Generic 3D Representation via Pose Estimation and Matching

Authors

  • Amir Roshan Zamir
  • Tilman Wekel
  • Pulkit Agrawal
  • Colin Wei
  • Jitendra Malik
  • Silvio Savarese
Abstract

Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited. In this paper, we learn a generic 3D representation through solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the premise that by providing supervision over a set of carefully selected foundational tasks, generalization to novel tasks and abstraction capabilities can be achieved. We empirically show that the internal representation of a multi-task ConvNet trained to solve the above core problems generalizes to novel 3D tasks (e.g., scene layout estimation, object pose estimation, surface normal estimation) without the need for fine-tuning and shows traits of abstraction abilities (e.g., cross-modality pose estimation). In the context of the core supervised tasks, we demonstrate that our representation achieves state-of-the-art wide baseline feature matching results without requiring a priori rectification (unlike SIFT and the majority of learnt features). We also show 6DOF camera pose estimation given a pair of local image patches. The accuracy on both supervised tasks is comparable to that of humans. Finally, we contribute a large-scale dataset composed of object-centric street view scenes along with point correspondences and camera pose information, and conclude with a discussion of the learned representation and open research questions.

Similar resources

Model-Assisted 3D Face Reconstruction from Video

This paper describes a model-assisted system for reconstruction of 3D faces from a single consumer quality camera using a structure from motion approach. Typical multi-view stereo approaches use the motion of a sparse set of features to compute camera pose followed by a dense matching step to compute the final object structure. Accurate pose estimation depends upon precise identification and ma...

Pattern Recognition 1996: Invariant Representation, Matching and Pose Estimation of 3D Space Curves

This paper presents a system for matching and pose estimation of 3D space curves under the similarity transformation composed of rotation, translation and uniform scaling. The system makes use of constraints not only on the feature points but also on the curve segment. A representation called the similarity-invariant coordinate system (SICS) is presented for deriving semi-local invariants of 3D...

Model-Based Head Tracking and 3D Pose Estimation

This paper presents a generic method for addressing the issue of 3D model-based head pose estimation. The method proposed relies on the downhill simplex optimization method and on the combination of motion and texture features. A proper initialization based on a block matching procedure associated with 3D/2D matching depending on texture and optical flow information leads to an accurate recover...

DeepIM: Deep Iterative Matching for 6D Pose Estimation

Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression of images to object poses has limited accuracy, matching rendered images of an object against the input image can produce accurate results. In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. Giv...

Learning and Evaluating Visual Features for Pose Estimation

We present a method for learning a set of visual landmarks which are useful for pose estimation. The landmark learning mechanism is designed to be applicable to a wide range of environments, and generalized for different approaches to computing a pose estimate. Initially, each landmark is detected as a local extremum of a measure of distinctiveness and represented by a principal components enco...

Publication date: 2016